Scalable distributed speech recognition using Gaussian mixture model-based block quantisation
نویسندگان
چکیده
In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen–Loève transform of the block quantiser. Compared with vector quantisers, the GMM-based block quantiser has relatively low computational and memory requirements which are independent of bitrate. More importantly, it is bitrate scalable, which means that the bitrate can be adjusted without the need for re-training. Static parameters such as the GMM and transform matrices are stored at the encoder and decoder and bit allocations are calculated on-the-fly without intensive processing. We have evaluated the quantisation scheme on the Aurora-2 database in a DSR framework. We show that jointly quantising more frames and using more mixture components in the GMM leads to higher recognition performance. The multi-frame GMM-based block quantiser achieves a word error rate (WER) of 2.5% at 800 bps, which is less than 1% degradation from the baseline (unquantised) word recognition accuracy, and graceful degradation down to a WER of 7% at 300 bps. 2005 Elsevier B.V. All rights reserved.
منابع مشابه
Scalable distributed speech recognition using multi-frame GMM-based block quantization
In this paper, we propose the use of the multi-frame Gaussian mixture model-based block quantizer for the coding of Mel frequencywarped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. This coding scheme exploits intraframe correlation via the Karhunen-Loéve transform (KLT) and interframe correlation via the joint processing of adjacent frames together ...
متن کاملRecognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
متن کاملSpeech Enhancement using Laplacian Mixture Model under Signal Presence Uncertainty
In this paper an estimator for speech enhancement based on Laplacian Mixture Model has been proposed. The proposed method, estimates the complex DFT coefficients of clean speech from noisy speech using the MMSE estimator, when the clean speech DFT coefficients are supposed mixture of Laplacians and the DFT coefficients of noise are assumed zero-mean Gaussian distribution. Furthermore, the MMS...
متن کاملMulti-frame GMM-based block quantisation of line spectral frequencies
In this paper, we investigate the use of the Gaussian mixture model-based block quantiser for coding line spectral frequencies that uses multiple frames and mean squared error as the quantiser selection criterion. As a viable alternative to vector quantisers, the GMM-based block quantiser encompasses both low computational and memory requirements as well as bitrate scalability. Jointly quantisi...
متن کاملSubspace Gaussian Mixture Models for Large Vocabulary Speech Recognition
Subspace Gaussian mixture model(GMM) is an alternative approach to approximate the probabilistic density function (p.d.f) of a set of independent identical distributed (i.i.d) data with prior density estimates. In this approach, the prior density of GMM parameters is estimated from a development dataset, and when predict the new enrolled data, the prior knowledge can be utilised by criteria lik...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Speech Communication
دوره 48 شماره
صفحات -
تاریخ انتشار 2006